A General Framework for Pixel Classification

The ideas presented here represent an attempt to define a natural
set of pixel categories which will be represented in a typical LANDSAT
scene and which we hope can be delineated with some success by the use
of available spatial/spectral clustering algorithms. The pixel categories
and their characteristics are:



P - The set of "pure" pixels; i.e., pixels from within fields.
These are characterized by a high degree of local spectral
homogeneity; that is, elements of P have adjacent pixels
which look spectrally alike.

$T_1$ - The set of "trash" pixels. These pixels do not have homogeneous
spatial neighborhoods and are relatively distant, spectrally, from
the set P.

B - The set of boundary pixels, pixels at the common boundaries of
adjacent fields. Elements of B have spatial neighbors in P
and no spatial neighbors in the set $T_1$.

$T_2$ - All other pixels. These have no pure neighbors or else have
neighbors in the class $T_1$; thus the spatial information is
ambiguous. However, elements of $T_2$ are relatively near, spectrally, to
the pure pixels.
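The spatial part of these definitions can be illustrated with a small sketch. It assumes that boolean masks for P and $T_1$ have already been produced by some earlier homogeneity and spectral-distance test (not shown, and not specified in the text), and that "neighbor" means the 4-connected neighborhood. The function names `has_neighbor` and `categorize` are hypothetical.

```python
import numpy as np

def has_neighbor(mask):
    """True where at least one 4-connected neighbor is True in `mask`."""
    padded = np.pad(mask, 1, constant_values=False)
    return (padded[:-2, 1:-1] | padded[2:, 1:-1] |
            padded[1:-1, :-2] | padded[1:-1, 2:])

def categorize(pure, trash):
    """Split a scene into P, T1, B, T2 masks, given pure and trash masks."""
    pure = np.asarray(pure, dtype=bool)
    trash = np.asarray(trash, dtype=bool) & ~pure
    rest = ~pure & ~trash
    # B: not pure, not trash, touches a pure pixel, touches no trash pixel.
    boundary = rest & has_neighbor(pure) & ~has_neighbor(trash)
    # T2: everything left over (no pure neighbor, or a trash neighbor).
    other = rest & ~boundary
    return {"P": pure, "T1": trash, "B": boundary, "T2": other}
```

The four masks returned always partition the scene, so every pixel receives exactly one category.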


Obviously if these four categories can be identified, they require
different means of processing to extract estimates of the acreages of the
real classes. $T_1$ will not be processed at all, since there is neither
spatial nor spectral evidence that it consists of agriculture. The
processing of $T_2$, if it occurs, will rely almost wholly on spectral
measurements, since the spatial information is ambiguous for elements of $T_2$.
B consists of pixels whose spectral response can be properly regarded as
a mixture of pure pixel responses, and a fairly detailed proposal for handling
B follows.

There is considerable doubt about whether a normal mixture model with
mixing proportions easily related to "real" class acreage proportions is
valid for LANDSAT agricultural data. It seems clearly inappropriate for
categories $T_1$ and B and possibly appropriate for $T_2$ and P. There are
objections to applying it to P. First, the reason for preferring the density
estimation approach to the clustering and counting approach, namely that the
proportion estimates are unbiased, may be invalid for P because the spectral
observations are far from independent. Second, the proportion estimates are
meaningless unless the component densities of the mixture are related to
real classes. This means the field structure of P must be respected by the
estimator. Meeting the first objection requires that the dependence between
nearby pixels be somehow modeled. The obvious (but probably not adequate)
solution to the second problem is to use the clusters generated in P by
a spatial/spectral clustering algorithm which preserves the integrity of
fields to initialize the parameters in a maximum likelihood algorithm for the
normal mixture distribution. For example, the algorithm AMOEBA, after
determining the best clustering of some test data, then assigns whole fields
to single clusters by a nearest cluster center classification of the field
means. It should be noted that in terms of the assumptions underlying AMOEBA
it is senseless to graft the familiar maximum likelihood procedures UHMLE
and CLASSY to AMOEBA in exactly the naive way just suggested. Indeed, they
are based on the wrong likelihood function for the kind of partitioned
sample we are considering with P.

In processing B, the boundary pixels, we suggest that the following
procedure should be considered. We assume that the set P of pure pixels
has been classified, so that a class label $l(r) \in \{1, \ldots, m\}$ is assigned
to each pixel $r \in P$. Given a pixel $r$ in the scene, let $x(r)$ denote its
vector of spectral measurements. For $r \in B$ let

$$P(r) = \{ s \in P \mid s \text{ is a neighbor of } r \}$$

and

$$C(r) = \{ l(s) \mid s \in P(r) \}.$$

Thus $C(r)$ is the set of class labels of pure spatial neighbors of $r$.

Let $C$ be a set of $\le 4$ (or $\le 8$) of the class labels $\{1, \ldots, m\}$. Define

$$B_C = \{ r \in B \mid C(r) = C \}.$$

Thus $B_C$ is the set of boundary pixels whose pure spatial neighbors have
exactly those classifications listed in $C$. For acreage estimation, we
treat each set $B_C$ separately and then combine the estimates to get an
acreage estimate for B. For simplicity we suppose that $C = \{1, 2\}$.

The generality of the discussion will be obvious. If $r \in B_C$ then $r$ has
pure neighbors in classes 1 and 2 only. (Recall that $r$ may have impure
neighbors, but none of them belong to the trash class $T_1$.) Let

$$P_1(r) = \{ s \in P(r) \mid l(s) = 1 \}$$

$$P_2(r) = \{ s \in P(r) \mid l(s) = 2 \}$$

and

$$P_1(B_C) = \bigcup_{r \in B_C} P_1(r), \qquad P_2(B_C) = \bigcup_{r \in B_C} P_2(r).$$


The following are our assumptions about the spectral measurements of
elements of $B_C$. Let $r$ be an arbitrary element of $B_C$.

1) For each $s \in P(r)$, a fraction $\beta(s,r)$ of the area of pixel $r$
has the same reflectance properties as $s$. The spectral response
from $r$ can be written as

$$x(r) = \beta_1(r)\, x_1(r) + \beta_2(r)\, x_2(r) + \varepsilon(r),$$

where, for $j = 1, 2$,

$$\beta_j(r) = \sum_{s \in P_j(r)} \beta(s,r), \qquad x_j(r) = \sum_{s \in P_j(r)} \frac{\beta(s,r)}{\beta_j(r)}\, x(s),$$

and $\varepsilon$ is an error term whose expectation is 0.

2) $\beta_1$ and $x_1$ are uncorrelated, as are $\beta_2$ and $x_2$.

3) $E[x_j(r) \mid r \in B_C] = E[x(s) \mid s \in P_j(B_C)]$, $j = 1, 2$.

If assumptions (1) - (3) are valid then

$$(*) \quad E[x(r) \mid r \in B_C] = E[\beta_1(r) \mid r \in B_C]\, E[x(s) \mid s \in P_1(B_C)] + E[\beta_2(r) \mid r \in B_C]\, E[x(s) \mid s \in P_2(B_C)].$$

The numbers $E[\beta_j(r) \mid r \in B_C]$ are easily related to the acreages of classes
1 and 2 in $B_C$.

In practice, we intend to estimate $E[\beta_j(r) \mid r \in B_C]$, $j = 1, 2$, as least
squares solutions of (*). If any set $B_C$ produces an unacceptably large
residual error we take that as an indication that the set $B_C$ defined by
the algorithm does not consist of boundary pixels. If many sets $B_C$ produce
large residual errors, even after experimenting with the tolerances
implicit in the definitions of P, B, $T_1$, and $T_2$, then we would tend to
believe that the boundary pixel model of assumptions (1) - (3) is wrong.
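The least squares step can be sketched as follows. This is only an illustration under stated assumptions: the three expectations in (*) are replaced by sample means over the boundary set and the two pure-neighbor sets, the fit is unconstrained (the text does not say whether the two fractions are forced to sum to one), and the function name `estimate_mixing` is hypothetical.

```python
import numpy as np

def estimate_mixing(x_boundary, x_pure1, x_pure2):
    """Least-squares estimate of (E[beta_1], E[beta_2]) in equation (*).

    Each argument is an array of shape (number of pixels, number of bands).
    """
    target = np.mean(x_boundary, axis=0)           # sample E[x(r) | r in B_C]
    design = np.column_stack([np.mean(x_pure1, axis=0),
                              np.mean(x_pure2, axis=0)])
    coef, _, _, _ = np.linalg.lstsq(design, target, rcond=None)
    # Norm of the misfit; a large value casts doubt on B_C being boundary pixels.
    residual = float(np.linalg.norm(design @ coef - target))
    return coef, residual
```

The returned residual plays the role of the "residual error" in the text: it is the quantity one would compare against a tolerance before accepting a set $B_C$ as consisting of boundary pixels.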



1. Introduction 




A possible objection to the use of UHMLE or CLASSY in conjunction with
AMOEBA is that both these algorithms ignore the association of pixels in
fields. Indeed AMOEBA is based on the explicit assumption that pixels in
the same field represent the same real class [1], while the assumptions
underlying the maximum likelihood algorithms imply that the classification
of a pixel is independent of the classification of other pixels. In this
report a statistical model based on normal mixtures is proposed which takes
into account the organization of LANDSAT agricultural data into fields
which are homogeneous as to crop type. Likelihood equations for the
parameters of the model are derived which may be solved iteratively as in
UHMLE.

2. The Model 

We assume that the data elements (pixel data vectors) are real n-vectors,
each from one of the statistical populations $\Pi_1, \ldots, \Pi_m$ with n-variate
density functions $p(x|\Pi_\ell)$, $\ell = 1, \ldots, m$. We assume that the data is
organized into sets (fields) $F_1, \ldots, F_p$, where $F_j$ has $N_j$ data elements
which have been previously ordered in some arbitrary fashion so that the
data elements in $F_j$ form an $nN_j$-dimensional vector denoted by

$$X_j = \begin{pmatrix} X_{j1} \\ \vdots \\ X_{jN_j} \end{pmatrix}.$$

Define random variables $\{\theta_{jk} \in \{1, \ldots, m\} \mid j = 1, \ldots, p;\ k = 1, \ldots, N_j\}$ by $\theta_{jk} = \ell$ if
and only if $X_{jk}$ is from $\Pi_\ell$. We assume that all the observations from $F_j$ are
from the same class, so that we may write $\theta_{jk} = \theta_j$ for all $j = 1, \ldots, p$;
$k = 1, \ldots, N_j$. Finally, we assume that $(X_1, \theta_1), \ldots, (X_p, \theta_p)$ are
independent, that the $\theta_j$'s are identically distributed, that
$\alpha_\ell = \mathrm{Prob}[\theta_j = \ell] > 0$ and that $\sum_{\ell=1}^m \alpha_\ell = 1$. Under the stated assumptions,
the joint density of $X_1, \ldots, X_p$ is

$$p(x_1, \ldots, x_p) = \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell\, p_\ell(x_j),$$

where $p_\ell(x_j) = p(x_{j1}, \ldots, x_{jN_j} \mid \theta_j = \ell)$ is the joint density of the elements of
$F_j$ given that $F_j$ represents class $\ell$.

Let $N = N_1 + \cdots + N_p$ and for each $\ell$ let $M_\ell$ denote the total number
of the $N$ observations which come from class $\ell$. The following
proposition shows that with reasonable restrictions on the field sizes $N_j$
the values of $\{M_\ell/N \mid \ell = 1, \ldots, m\}$ can be inferred from a knowledge of the
parameters $\alpha_\ell$. Thus, acreage estimates of the classes can be derived
from estimates of the parameters $\alpha_\ell$.


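Proposition 1: (a) $E(M_\ell) = N\alpha_\ell$.

(b) $M_\ell/N \to \alpha_\ell$ in probability as $p \to \infty$ if and only if

$$\lim_{p \to \infty} \frac{1}{N^2} \sum_{j=1}^{p} N_j^2 = 0.$$

(c) If $\sum_{j=1}^{\infty} N_j^2 / j^2 < \infty$, then $M_\ell/N \to \alpha_\ell$ almost surely.

Proof: (a) Write

$$M_\ell = \sum_{j=1}^{p} \sum_{k=1}^{N_j} \chi_\ell(\theta_{jk}) = \sum_{j=1}^{p} N_j\, \chi_\ell(\theta_j),$$

where

$$\chi_\ell(r) = \begin{cases} 1, & r = \ell \\ 0, & r \ne \ell. \end{cases}$$

Then $E(M_\ell) = \sum_j N_j E(\chi_\ell(\theta_j)) = \sum_j N_j \alpha_\ell = N\alpha_\ell$.

(b) Since $M_\ell/N - \alpha_\ell$ is bounded, it converges to zero
in probability iff $\mathrm{var}(M_\ell/N) \to 0$ as $p \to \infty$. Since the terms $N_j \chi_\ell(\theta_j)$ are
independent,

$$\mathrm{var}\left(\frac{M_\ell}{N}\right) = \frac{1}{N^2} \sum_{j=1}^{p} N_j^2\, \mathrm{var}(\chi_\ell(\theta_j)) = \frac{1}{N^2} \sum_{j=1}^{p} N_j^2\, \alpha_\ell (1 - \alpha_\ell).$$

The conclusion follows.

(c) The assertion follows immediately from Kolmogorov's version of the
strong law of large numbers [3].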
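A small numerical sketch may make part (b) concrete. It evaluates the variance formula from the proof and compares it with a simulation in which each field is assigned to the class independently with probability $\alpha_\ell$. The function names, the seed, and the example field sizes are illustrative assumptions, not part of the report.

```python
import numpy as np

def fraction_variance(field_sizes, alpha):
    """var(M/N) = alpha (1 - alpha) * sum(N_j^2) / N^2 for a single class."""
    sizes = np.asarray(field_sizes, dtype=float)
    return alpha * (1.0 - alpha) * np.sum(sizes ** 2) / np.sum(sizes) ** 2

def simulate_fractions(field_sizes, alpha, trials, seed=0):
    """Draw independent field labels and return M/N for each trial."""
    rng = np.random.default_rng(seed)
    sizes = np.asarray(field_sizes, dtype=float)
    # in_class[t, j] plays the role of chi(theta_j) in trial t.
    in_class = rng.random((trials, sizes.size)) < alpha
    return in_class @ sizes / sizes.sum()
```

With many fields of comparable size the variance is small, whereas a scene dominated by one very large field keeps the variance near $\alpha_\ell(1-\alpha_\ell)$, which is the situation excluded by the condition in (b).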

3. Maximum Likelihood Estimation of the Parameters

In this section we suppose that the class conditional densities
$p(x|\Pi_\ell)$ of the data elements are n-variate normal $N(x; \mu_\ell, \Sigma_\ell)$ and that
$\{X_{jk} \mid k = 1, \ldots, N_j\}$ are class conditionally independent; i.e., that

$$p_\ell(x_j) = \prod_{k=1}^{N_j} N(x_{jk}; \mu_\ell, \Sigma_\ell)$$

for $j = 1, \ldots, p$. In this case the joint density of $X_1, \ldots, X_p$,

$$p(x_1, \ldots, x_p) = \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell \prod_{k=1}^{N_j} N(x_{jk}; \mu_\ell, \Sigma_\ell),$$

is parametrized by $\{(\alpha_\ell, \mu_\ell, \Sigma_\ell) \mid \ell = 1, \ldots, m\}$ where $\alpha_\ell \ge 0$, $\sum_\ell \alpha_\ell = 1$,
and $\Sigma_\ell$ is a real $n \times n$ positive definite symmetric matrix. Whenever
a density is evaluated using estimates of its parameters, we denote it,
e.g., by $\hat{p}(x_1, \ldots, x_p)$. By a maximum likelihood estimate (MLE) of the
parameters $\{(\alpha_\ell, \mu_\ell, \Sigma_\ell)\}$ we mean an element $\{(\hat\alpha_\ell, \hat\mu_\ell, \hat\Sigma_\ell) \mid \ell = 1, \ldots, m\}$
of the parameter set which locally maximizes $\hat{p}(X_1, \ldots, X_p)$. By arguments
similar to those used in [2], the following necessary conditions for a MLE
are derived.


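1) $\displaystyle \frac{1}{p} \sum_{j=1}^{p} \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)} \le 1$, with equality when $\hat\alpha_\ell > 0$

2) $\displaystyle \hat\mu_\ell = \sum_{j=1}^{p} \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)}\, N_j \bar{X}_j \Bigg/ \sum_{j=1}^{p} N_j \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)}$

3) $\displaystyle \hat\Sigma_\ell = \sum_{j=1}^{p} \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)} \sum_{k=1}^{N_j} (X_{jk} - \hat\mu_\ell)(X_{jk} - \hat\mu_\ell)^T \Bigg/ \sum_{j=1}^{p} N_j \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)}$

Here $\hat{p}(X_j) = \sum_{\ell=1}^{m} \hat\alpha_\ell\, \hat{p}_\ell(X_j)$. In equation (2), $\bar{X}_j = \frac{1}{N_j} \sum_{k=1}^{N_j} X_{jk}$ is the mean of the jth field observations.

By multiplying (1) by $\hat\alpha_\ell$ we obtain

4) $\displaystyle \hat\alpha_\ell = \frac{\hat\alpha_\ell}{p} \sum_{j=1}^{p} \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)}$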


which, together with (2) and (3), suggests an iterative procedure for
solution of the likelihood equations (2) - (4) analogous to that used
in UHMLE [2]. However, the likelihood equations can be considerably
simplified by observing that the sequence $(\bar{X}_1, S_1), \ldots, (\bar{X}_p, S_p)$ is a
sufficient statistic for the model, where $S_j$ is the sample scatter matrix
of the jth field:

$$S_j = \sum_{k=1}^{N_j} (X_{jk} - \bar{X}_j)(X_{jk} - \bar{X}_j)^T.$$

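Equation (3) may be rewritten

5) $\displaystyle \hat\Sigma_\ell = \sum_{j=1}^{p} \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)} \left[ S_j + N_j (\bar{X}_j - \hat\mu_\ell)(\bar{X}_j - \hat\mu_\ell)^T \right] \Bigg/ \sum_{j=1}^{p} N_j \frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)}$

The sufficiency of $\{(\bar{X}_j, S_j)\}$ implies

$$\frac{\hat{p}_\ell(X_j)}{\hat{p}(X_j)} = \frac{\hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)},$$

where $\hat{q}_\ell(\bar{X}_j, S_j)$ is the estimated joint density of $\bar{X}_j$ and $S_j$ given that $F_j$
represents class $\ell$ and $\hat{q}(\bar{X}_j, S_j) = \sum_{\ell=1}^{m} \hat\alpha_\ell\, \hat{q}_\ell(\bar{X}_j, S_j)$. The joint density
$q_\ell(\bar{X}_j, S_j)$ may be expressed as

$$q_\ell(\bar{X}_j, S_j) = N_n\!\left(\bar{X}_j;\ \mu_\ell,\ \tfrac{1}{N_j}\Sigma_\ell\right) W_n(S_j;\ N_j - 1,\ \Sigma_\ell),$$

where $N_n(\bar{X}_j; \mu_\ell, \tfrac{1}{N_j}\Sigma_\ell)$ is the n-variate normal density of $\bar{X}_j$ and
$W_n(S_j; N_j - 1, \Sigma_\ell)$ is the Wishart density of $S_j$ with $N_j - 1$ degrees of freedom [3].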
Thus the likelihood equations may be written as 


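6) $\displaystyle \hat\alpha_\ell = \frac{\hat\alpha_\ell}{p} \sum_{j=1}^{p} \frac{\hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)}$

7) $\displaystyle \hat\mu_\ell = \sum_{j=1}^{p} \frac{N_j\, \hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)}\, \bar{X}_j \Bigg/ \sum_{j=1}^{p} \frac{N_j\, \hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)}$

8) $\displaystyle \hat\Sigma_\ell = \sum_{j=1}^{p} \frac{\hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)}\, S_j \Bigg/ \sum_{j=1}^{p} \frac{N_j\, \hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)} \;+\; \sum_{j=1}^{p} \frac{N_j\, \hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)} (\bar{X}_j - \hat\mu_\ell)(\bar{X}_j - \hat\mu_\ell)^T \Bigg/ \sum_{j=1}^{p} \frac{N_j\, \hat{q}_\ell(\bar{X}_j, S_j)}{\hat{q}(\bar{X}_j, S_j)}$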


Equations (6) - (8) are to be used as the basis of the iteration procedure.
Indeed when each $N_j = 1$ they reduce to the likelihood equations employed
in UHMLE.
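One way to read the iteration is as a fixed-point scheme in which every field, rather than every pixel, receives a posterior class weight. The sketch below is an illustration under stated assumptions and is not the UHMLE code: the weights are computed from the product of pixel densities, which gives the same ratio as the mean/scatter density by sufficiency; the covariance update uses the freshly updated mean; and the names `field_log_density` and `iterate` are hypothetical.

```python
import numpy as np

def field_log_density(n_j, mean_j, scatter_j, mu, cov):
    """log of prod_k N(x_jk; mu, cov), written via the field mean and scatter."""
    dim = mu.size
    diff = mean_j - mu
    spread = scatter_j + n_j * np.outer(diff, diff)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.trace(np.linalg.solve(cov, spread))
    return -0.5 * (n_j * (dim * np.log(2 * np.pi) + logdet) + quad)

def iterate(fields, alpha, mus, covs, steps=50):
    """Repeat the updates (6)-(8); `fields` is a list of (N_j, n) arrays."""
    sizes = np.array([f.shape[0] for f in fields], dtype=float)
    means = np.array([f.mean(axis=0) for f in fields])
    scatters = np.array([(f - f.mean(axis=0)).T @ (f - f.mean(axis=0))
                         for f in fields])
    alpha = np.array(alpha, dtype=float)
    mus = np.array(mus, dtype=float)
    covs = np.array(covs, dtype=float)
    for _ in range(steps):
        logq = np.array([[np.log(alpha[l]) +
                          field_log_density(sizes[j], means[j], scatters[j],
                                            mus[l], covs[l])
                          for l in range(alpha.size)]
                         for j in range(len(fields))])
        logq -= logq.max(axis=1, keepdims=True)    # guard against underflow
        w = np.exp(logq)
        w /= w.sum(axis=1, keepdims=True)          # w[j, l] = alpha_l q_l / q
        alpha = w.mean(axis=0)                     # equation (6)
        weight = w * sizes[:, None]                # N_j * w[j, l]
        for l in range(alpha.size):
            mus[l] = weight[:, l] @ means / weight[:, l].sum()      # (7)
            diff = means - mus[l]
            between = np.einsum('j,ja,jb->ab', weight[:, l], diff, diff)
            within = np.einsum('j,jab->ab', w[:, l], scatters)
            covs[l] = (within + between) / weight[:, l].sum()       # (8)
    return alpha, mus, covs
```

Because only the field sizes, means, and scatter matrices enter the loop, the cost per step depends on the number of fields rather than the number of pixels, which is the practical benefit of the sufficiency observation.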


4. Concluding Remarks

The questions of the existence of a consistent MLE as $p \to \infty$ and the
local convergence of the iterative procedure will be addressed in a future
report. We remark that the standard consistency results of Cramér, Chanda,
and Wald (see [2] for references) are not directly applicable since the
$(\bar{X}_j, S_j)$ are not identically distributed. Numerical results will also
be reported at a later date.
Abstract

General theorems concerning the strong consistency of the MLE of
exponential mixture parameters are proved. These theorems imply the strong
consistency of the MLE of normal mixture parameters when the data is
organized into "fields" each of which is a random sample from one of the
component normal distributions.


1. Introduction 


In [5] a statistical model for LANDSAT agricultural data based on normal
mixtures was introduced which admits a specific kind of dependence among the
observations, namely their association into fields each representing a single
agricultural class. Necessary conditions were derived for a maximum
likelihood estimate of the parameters of the model and a numerical procedure for
solution of the likelihood equations was suggested. The question of the
consistency of the maximum likelihood estimate is complicated by the fact
that it is no longer possible to reduce the sample to a set of independent
identically distributed variables. The purpose of this note is to establish
a general theorem on the existence of a consistent maximum likelihood estimate
when the observations are not identically distributed and to show its
applicability to the statistical model described in detail below.


We assume that each pixel is identified by a pair $(j,k)$ of positive
integers, where the first index $j$, $1 \le j \le p$, identifies the field containing
the pixel and the second index $k$, $1 \le k \le N_j$, distinguishes it from other
pixels in the same field. We suppose that the field structure is predetermined,
perhaps as part of a spatial clustering algorithm such as AMOEBA. Let
$X_{jk} \in R^n$ be the random vector of spectral measurements from pixel $(j,k)$ and
let $\theta_{jk} \in \{1, \ldots, m\}$ be an unobserved random variable indicating its class
index. We assume that the class indices $\theta_{j1}, \theta_{j2}, \ldots, \theta_{jN_j}$ from the jth
field are all the same and denote their common value by $\theta_j$. We further
assume that, conditioned on $\theta_j = \ell$, the measurements $X_{j1}, \ldots, X_{jN_j}$ are
independently distributed as $N_n(\mu_\ell, \Sigma_\ell)$, the n-variate normal with
unknown mean $\mu_\ell$ and unknown covariance $\Sigma_\ell$. Let $X_j = (X_{j1}, \ldots, X_{jN_j})$.
Our final assumptions are that $(X_1, \theta_1), \ldots, (X_p, \theta_p)$ are independent
and that $\{\theta_j\}$ are identically distributed with unknown $\alpha_\ell = \mathrm{Prob}[\theta_j = \ell] > 0$.
Under these assumptions, the joint density of all the observations is

$$(1) \quad \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell \prod_{k=1}^{N_j} N_n(x_{jk}; \mu_\ell, \Sigma_\ell)$$

where $x_j = (x_{j1}, \ldots, x_{jN_j}) \in R^{nN_j}$. This joint density is parametrized by
$\{(\alpha_\ell, \mu_\ell, \Sigma_\ell) \mid \ell = 1, \ldots, m\}$ where $\alpha_\ell > 0$; $\sum_{\ell=1}^{m} \alpha_\ell = 1$; $\mu_\ell \in R^n$; and $\Sigma_\ell$ is
a real $n \times n$ positive definite symmetric matrix. For convenience, we let
$\psi = \{(\alpha_\ell, \mu_\ell, \Sigma_\ell) \mid \ell = 1, \ldots, m\}$ denote an arbitrary member of the parameter
space and $\psi^\circ$ the true value of the parameter. Thus the likelihood function
corresponding to the sample $X_1, \ldots, X_p$ is

$$(2) \quad L(\psi; X_1, \ldots, X_p) = \prod_{j=1}^{p} \sum_{\ell=1}^{m} \alpha_\ell \prod_{k=1}^{N_j} N_n(X_{jk}; \mu_\ell, \Sigma_\ell).$$

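For $x_j = (x_{j1}, \ldots, x_{jN_j}) \in R^{nN_j}$ let

$$(3) \quad m_j = m_j(x_j) = \frac{1}{N_j} \sum_{k=1}^{N_j} x_{jk} \quad \text{and} \quad S_j = S_j(x_j) = \sum_{k=1}^{N_j} (x_{jk} - m_j)(x_{jk} - m_j)^T$$

be the mean and scatter matrix respectively of the vectors $x_{j1}, \ldots, x_{jN_j}$. Then

$$\prod_{k=1}^{N_j} N_n(x_{jk}; \mu_\ell, \Sigma_\ell) = (2\pi)^{-nN_j/2}\, q_j(x_j; \mu_\ell, \Sigma_\ell)$$

where

$$(4) \quad q_j(x_j; \mu_\ell, \Sigma_\ell) = |\Sigma_\ell|^{-N_j/2} \exp\left\{ -\tfrac{1}{2} \sum_{k=1}^{N_j} (x_{jk} - \mu_\ell)^T \Sigma_\ell^{-1} (x_{jk} - \mu_\ell) \right\} = |\Sigma_\ell|^{-N_j/2} \exp\left\{ -\tfrac{1}{2}\, \mathrm{tr}\, \Sigma_\ell^{-1} \left[ S_j + N_j (m_j - \mu_\ell)(m_j - \mu_\ell)^T \right] \right\}.$$

Let

$$(5) \quad q_j(x_j|\psi) = \sum_{\ell=1}^{m} \alpha_\ell\, q_j(x_j; \mu_\ell, \Sigma_\ell).$$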

By ignoring terms which are independent of the parameters we derive the log
likelihood function

$$(6) \quad \log L(\psi) = \sum_{j=1}^{p} \log q_j(X_j|\psi)$$

which leads to the following necessary conditions for a local maximum of the
likelihood function. Equations (7) - (9) are called the likelihood equations
for the present model.


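$$(7) \quad \sum_{j=1}^{p} \frac{q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)} = p$$

$$(8) \quad \mu_\ell = \sum_{j=1}^{p} \frac{N_j\, q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)}\, m_j \Bigg/ \sum_{j=1}^{p} \frac{N_j\, q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)}$$

$$(9) \quad \Sigma_\ell = \sum_{j=1}^{p} \frac{q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)}\, S_j \Bigg/ \sum_{j=1}^{p} \frac{N_j\, q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)} \;+\; \sum_{j=1}^{p} \frac{N_j\, q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)} (m_j - \mu_\ell)(m_j - \mu_\ell)^T \Bigg/ \sum_{j=1}^{p} \frac{N_j\, q_j(X_j; \mu_\ell, \Sigma_\ell)}{q_j(X_j|\psi)}$$

for $\ell = 1, \ldots, m$, where $m_j = m_j(X_j)$ and $S_j = S_j(X_j)$.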

2. The General Theorem

Let $\Theta$ be an open subset of $R^\nu$ and let $\psi^\circ \in \Theta$. Suppose $\{X_r\}_{r=1}^{\infty}$
is a sequence of independent random vectors with $X_r$ having $N_r$-variate density
function $q_r(\cdot|\psi^\circ)$ with respect to some fixed $\sigma$-finite measure $\lambda_r$ on $R^{N_r}$.
Suppose the densities $q_r(\cdot|\psi)$ are defined for each $\psi \in \Theta$. Given a
positive integer $p$, define a maximum likelihood estimate of $\psi^\circ$ to be an
element $\hat\psi \in \Theta$ which locally maximizes $L_p(\psi) = \sum_{r=1}^{p} \log q_r(X_r|\psi)$. The equation
$D_\psi L_p(\psi) = 0$ will be called the likelihood equation, where the symbol $D_\psi$
denotes the Fréchet derivative with respect to $\psi$.


A number of theorems dealing with the consistency of maximum likelihood
estimates, under the additional assumption that the $X_r$'s are identically
distributed, have been presented in the literature (see for instance Chanda [2],
Cramér [4], and Wald [8]). Extending any of these results to the case of
nonidentically distributed observations is primarily a matter of finding a
convenient set of conditions which insures that a law of large numbers can be
invoked at several points in the proofs. The following theorem is such an
outgrowth of the proof of strong consistency contained in [6].
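Theorem 1: Suppose there is a neighborhood $\Omega$ of $\psi^\circ$ and $\lambda_r$-null sets
$U_r$ in $R^{N_r}$ such that for all $\psi \in \Omega$, $x \notin U_r$ ($i, j, k = 1, \ldots, \nu$; $r$ ranging over the
positive integers)

$$\frac{\partial q_r(x|\psi)}{\partial \psi_i}, \qquad \frac{\partial^2 q_r(x|\psi)}{\partial \psi_i\, \partial \psi_j}, \qquad \text{and} \qquad \frac{\partial^3 \log q_r(x|\psi)}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k}$$

exist and satisfy:

(i) $\displaystyle \left| \frac{\partial q_r(x|\psi)}{\partial \psi_i} \right| \le f_{ir}(x)$

(ii) $\displaystyle \left| \frac{\partial^2 q_r(x|\psi)}{\partial \psi_i\, \partial \psi_j} \right| \le f_{ijr}(x)$

(iii) $\displaystyle \left| \frac{\partial^3 \log q_r(x|\psi)}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k} \right| \le f_{ijkr}(x)$

where $f_{ir}$ and $f_{ijr}$ are $\lambda_r$-integrable on $R^{N_r}$ and

(iv) $\displaystyle E[f_{ijkr}(X_r)^2] = \int_{R^{N_r}} f_{ijkr}(x)^2\, q_r(x|\psi^\circ)\, d\lambda_r(x) < M$

for all $r$, where $M$ is a constant. Suppose also that

(v) $\displaystyle E\left[ \left( \frac{\partial \log q_r(X_r|\psi^\circ)}{\partial \psi_i} \right)^4 \right] < M$

and

(vi) $\displaystyle E\left[ \left( \frac{1}{q_r(X_r|\psi^\circ)} \frac{\partial^2 q_r(X_r|\psi^\circ)}{\partial \psi_i\, \partial \psi_j} \right)^2 \right] < M$

for all $i$, $j$ and $r$. Finally suppose that there is an $\varepsilon > 0$ such that

(vii) $\displaystyle J_r(\psi^\circ) = E\left[ \nabla_\psi \log q_r(X_r|\psi^\circ)\, \nabla_\psi \log q_r(X_r|\psi^\circ)^T \right] \ge \varepsilon I$

for all $r$, where the ordering is the usual one on $\nu \times \nu$ symmetric matrices.
Then, it is almost surely true that, given a sufficiently small neighborhood
of $\psi^\circ$, for large $p$ there is a unique solution of the likelihood equation
$D_\psi L_p(\psi) = 0$ in that neighborhood. Furthermore, that solution is a maximum
likelihood estimate.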


Remark: In the proof we make repeated use of the following simple version
of the strong law of large numbers (see Chung [3]): Let $Z_1, Z_2, \ldots$ be
uncorrelated random variables and suppose the sequence of variances $\{\mathrm{var}(Z_i)\}_{i=1}^{\infty}$
is bounded. Then $\frac{1}{n} \sum_{i=1}^{n} (Z_i - E(Z_i)) \to 0$ a.s. as $n \to \infty$.


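Proof of the theorem: Let $\ell_p(\psi) = \frac{1}{p} \sum_{r=1}^{p} D_\psi \log q_r(X_r|\psi)$. By assumption (i)
$E(\ell_p(\psi^\circ)) = 0$ and by assumption (v) and the strong law, $\ell_p(\psi^\circ) \to 0$ a.s. as
$p \to \infty$. Now consider the $\nu \times \nu$ matrix $D_\psi \ell_p(\psi^\circ)$ whose $ij$th element is

$$\frac{1}{p} \sum_{r=1}^{p} \frac{\partial^2 \log q_r(X_r|\psi^\circ)}{\partial \psi_i\, \partial \psi_j} = \frac{1}{p} \sum_{r=1}^{p} \frac{1}{q_r(X_r|\psi^\circ)} \frac{\partial^2 q_r(X_r|\psi^\circ)}{\partial \psi_i\, \partial \psi_j} - \frac{1}{p} \sum_{r=1}^{p} \frac{\partial \log q_r(X_r|\psi^\circ)}{\partial \psi_i} \frac{\partial \log q_r(X_r|\psi^\circ)}{\partial \psi_j}.$$

By assumption (ii) the expected value of the first term on the right is zero.
Hence, by assumptions (v) and (vi),

$$D_\psi \ell_p(\psi^\circ) + \frac{1}{p} \sum_{r=1}^{p} J_r(\psi^\circ) \to 0 \quad \text{a.s. as } p \to \infty.$$

It follows that with probability 1, for each $\eta$ in $0 < \eta < \varepsilon/2$
there is a $p_0$ so that for $p \ge p_0$,

$$D_\psi \ell_p(\psi^\circ) \le -2\eta I.$$

Without loss of generality we can assume $\Omega$ is convex.
Thus, for $\psi \in \Omega$,

$$\left| \frac{1}{p} \sum_{r=1}^{p} \left[ \frac{\partial^2 \log q_r(X_r|\psi)}{\partial \psi_i\, \partial \psi_j} - \frac{\partial^2 \log q_r(X_r|\psi^\circ)}{\partial \psi_i\, \partial \psi_j} \right] \right| \le \frac{1}{p} \sum_{r=1}^{p} \sum_{k=1}^{\nu} |\psi_k - \psi_k^\circ| \int_0^1 \left| \frac{\partial^3 \log q_r(X_r|\psi^\circ + t(\psi - \psi^\circ))}{\partial \psi_i\, \partial \psi_j\, \partial \psi_k} \right| dt \le \frac{1}{p} \sum_{r=1}^{p} \sum_{k=1}^{\nu} |\psi_k - \psi_k^\circ|\, f_{ijkr}(X_r).$$

With probability 1, for large $p$,

$$\frac{1}{p} \sum_{r=1}^{p} f_{ijkr}(X_r) \le 1 + \sqrt{M}$$

by assumption (iv).

It follows that for any particular norms on $R^\nu$ and on the $\nu \times \nu$ symmetric
matrices there is a constant $R$ such that with probability 1 there is a
$p_1$ such that for all $p \ge p_1$ and $\psi \in \Omega$,

$$\| D_\psi \ell_p(\psi) - D_\psi \ell_p(\psi^\circ) \| \le R\, \| \psi - \psi^\circ \|.$$

Thus, there is a convex neighborhood $\Omega_0$ of $\psi^\circ$ such that

$$D_\psi \ell_p(\psi) \le -\eta I$$

for all $\psi \in \Omega_0$, $p \ge p_1$. It now follows as in [6] that for $p \ge p_1$,
$\ell_p$ is one to one on $\Omega_0$ and that the image under $\ell_p$ of the sphere
centered at $\psi^\circ$ of small radius $\delta$ contains the sphere centered at $\ell_p(\psi^\circ)$ of
radius $\eta\delta$. Since 0 is eventually in this sphere, there is a unique
solution of $\ell_p(\psi) = 0$ in $\Omega_0$. Since $D_\psi \ell_p(\psi)$ is negative definite,
this solution is a maximum likelihood estimate. This concludes the proof.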

Theorem 1 shows that by restricting attention to a fixed neighborhood
of $\psi^\circ$ it is possible to speak unambiguously of the unique consistent
solution of the likelihood equations or, equivalently, of the unique
consistent MLE of $\psi^\circ$. This terminology will be adopted in the next theorem.


3. Application to Exponential Mixtures

In this section we apply Theorem 1 to a class of mixture models which
contains the normal mixture model of Section 1. Referring to the notation of
that section, we assume that conditioned on $\theta_j = \ell$, the random n-vectors
$X_{j1}, \ldots, X_{jN_j}$ are independent with a common density of exponential type

$$(1) \quad f(x|\tau_\ell) = C(\tau_\ell) \exp \langle \tau_\ell | F(x) \rangle$$

with respect to a dominating $\sigma$-finite measure $\lambda$ where the parameter $\tau_\ell$
is an arbitrary member of an open subset $U$ of a finite dimensional vector
space $V$ with inner product $\langle \cdot | \cdot \rangle$. We assume also that $C$ is one to one
and that the support of the measure induced on $V$ by $F$ and $\lambda$ contains
an open set. These conditions imply that the parameter is identifiable
[1], and any parametrization of the form (1) satisfying them will be called
a canonical representation of the given family of distributions.

The joint density, given $\theta_j = \ell$, of $X_j = (X_{j1}, \ldots, X_{jN_j})$ is also of
exponential type; i.e., for $x_j = (x_{j1}, \ldots, x_{jN_j})$,

$$(2) \quad p_j(x_j|\tau_\ell) = C(\tau_\ell)^{N_j} \exp \langle \tau_\ell | G_j(x_j) \rangle$$

where

$$G_j(x_j) = \sum_{k=1}^{N_j} F(x_{jk}),$$

and the representation (2) is also canonical.

Some useful facts about exponential families are collected in the
following lemma. For proofs see Barndorff-Nielsen [1].

Lemma 1: Let (1) be a canonical representation of an exponential family.
For each $\tau \in U$ let $\kappa(\tau) = -\ln C(\tau) = \ln \int \exp \langle \tau | F(x) \rangle\, d\lambda(x)$. Then

(i) for each $\tau \in U$, $F(x)$ has moments of all orders with respect
to $f(x|\tau)$;

(ii) $\kappa(\tau)$ has derivatives of all orders with respect to $\tau$, which
may be obtained by differentiating under the integral sign. Indeed
$D_\tau^k \kappa(\tau)$ can be represented as a symmetric k-linear form
on $V$ which is a polynomial in the first $k$ moments of $F$. In
particular,

(iii) $D_\tau \kappa(\tau) = \langle E_\tau(F) | \cdot \rangle = \int \langle F(x) | \cdot \rangle\, f(x|\tau)\, d\lambda(x)$

and

(iv) $D_\tau^2 \kappa(\tau) = \mathrm{cov}_\tau(F) = \int \langle F - E_\tau(F) | \cdot \rangle^2 f(x|\tau)\, d\lambda(x)$, which is
positive definite.

(v) $\kappa(\tau)$ is strictly convex on $U$.

(Expressions $\langle a | \cdot \rangle^k$ like that in (iv) are meant to denote k-linear forms;
e.g. $\langle a | \cdot \rangle^2$ denotes the bilinear form $b(\eta, \nu) = \langle a | \eta \rangle \langle a | \nu \rangle$.)
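Parts (iii) and (iv) of the lemma can be checked numerically on a familiar one-parameter family. The sketch below assumes the Poisson family written in canonical form, with $F(x) = x$ and dominating measure $\lambda(\{x\}) = 1/x!$ on the nonnegative integers, so that $\kappa(\tau) = e^\tau$. This choice of family, the series truncation at 200 terms, and the function names are illustrative assumptions.

```python
import math

def kappa(tau, terms=200):
    """kappa(tau) = ln sum_x exp(tau * x) / x!  (Poisson family, F(x) = x)."""
    logs = [tau * x - math.lgamma(x + 1) for x in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def moments(tau, terms=200):
    """Mean and variance of F(x) = x under f(x | tau)."""
    k = kappa(tau, terms)
    probs = [math.exp(tau * x - math.lgamma(x + 1) - k) for x in range(terms)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

def derivatives(tau, h=1e-4):
    """Central finite differences for the first two derivatives of kappa."""
    first = (kappa(tau + h) - kappa(tau - h)) / (2 * h)
    second = (kappa(tau + h) - 2 * kappa(tau) + kappa(tau - h)) / h ** 2
    return first, second
```

For any fixed `tau`, the pair returned by `derivatives` should agree with the pair returned by `moments` up to finite-difference error, and the positive second derivative reflects the strict convexity in (v).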

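We are now ready to apply Theorem 1 to the mixture model

$$(3) \quad q(x|\psi) = \prod_{j=1}^{p} q_j(x_j|\psi)$$

where $\psi = (\alpha_1, \ldots, \alpha_{m-1}, \tau_1, \ldots, \tau_m)$, $x = (x_1, \ldots, x_p)$,

$$(4) \quad q_j(x_j|\psi) = \sum_{\ell=1}^{m} \alpha_\ell\, p_j(x_j|\tau_\ell), \qquad \alpha_m = 1 - \sum_{\ell=1}^{m-1} \alpha_\ell,$$

and $p_j(x_j|\tau_\ell)$ has the canonical exponential representation given in (2).

Theorem 2: If the numbers $\{N_j\}$ in the mixture model (3) are bounded, then
with probability 1 there is a unique consistent MLE of the parameter $\psi^\circ$.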


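Proof: Using Lemma 1 and writing $\mu_j(\tau_\ell) = E_{\tau_\ell}(G_j)$, the nonzero derivatives
of $q_j(x_j|\psi)$ up to order 2 are:

(5) $D_{\alpha_\ell} q_j(x_j|\psi) = p_j(x_j|\tau_\ell) - p_j(x_j|\tau_m)$, $\ell = 1, \ldots, m-1$

(6) $D_{\tau_\ell} q_j(x_j|\psi) = \alpha_\ell\, p_j(x_j|\tau_\ell)\, \langle G_j - \mu_j(\tau_\ell) | \cdot \rangle$, $\ell = 1, \ldots, m$

(7) $D_{\alpha_\ell} D_{\tau_\ell} q_j(x_j|\psi) = p_j(x_j|\tau_\ell)\, \langle G_j - \mu_j(\tau_\ell) | \cdot \rangle$, $\ell = 1, \ldots, m-1$

(8) $D_{\alpha_\ell} D_{\tau_m} q_j(x_j|\psi) = -p_j(x_j|\tau_m)\, \langle G_j - \mu_j(\tau_m) | \cdot \rangle$, $\ell = 1, \ldots, m-1$

(9) $D_{\tau_\ell}^2 q_j(x_j|\psi) = \alpha_\ell\, p_j(x_j|\tau_\ell)\, \{ \langle G_j - \mu_j(\tau_\ell) | \cdot \rangle^2 - \mathrm{cov}_{\tau_\ell}(G_j) \}$, $\ell = 1, \ldots, m$.

Instead of verifying conditions (i) and (ii) of Theorem 1, it is easier
to recall that they were needed only in order to conclude that the integrals
of the first and second order derivatives of $q_j(x_j|\psi)$ are zero at $\psi = \psi^\circ$.
This is obvious from (5) - (9). Similarly, using Lemma 1 and the boundedness
of $\{N_j\}$ the verification of conditions (iii) - (vi) presents no problem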
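more serious than tedium. It remains to verify condition (vii). We may
write $J_r(\psi^\circ)$ in matrix form as

$$J_r(\psi^\circ) = \begin{pmatrix} I_1 & 0 \\ 0 & N_r^{1/2} I_2 \end{pmatrix} \begin{pmatrix} A_r & B_r \\ B_r^T & C_r \end{pmatrix} \begin{pmatrix} I_1 & 0 \\ 0 & N_r^{1/2} I_2 \end{pmatrix}$$

where $I_1$ and $I_2$ are, respectively, the identity operators on $R^{m-1}$
and $V^m$ and

$$A_r = \left( \int \frac{[p_r(y|\tau_\ell) - p_r(y|\tau_m)]\,[p_r(y|\tau_k) - p_r(y|\tau_m)]}{q_r(y|\psi^\circ)}\, d\lambda_r(y) \right)_{\ell, k = 1, \ldots, m-1}$$

$$B_r = \left( N_r^{-1/2} \int \frac{\alpha_k\, p_r(y|\tau_k)\,[p_r(y|\tau_\ell) - p_r(y|\tau_m)]}{q_r(y|\psi^\circ)}\, \langle G_r - \mu_r(\tau_k) | \cdot \rangle\, d\lambda_r(y) \right)_{\ell = 1, \ldots, m-1;\ k = 1, \ldots, m}$$

$$C_r = \left( N_r^{-1} \int \frac{\alpha_\ell \alpha_k\, p_r(y|\tau_\ell)\, p_r(y|\tau_k)}{q_r(y|\psi^\circ)}\, \langle G_r - \mu_r(\tau_\ell) | \cdot \rangle \langle G_r - \mu_r(\tau_k) | \cdot \rangle\, d\lambda_r(y) \right)_{\ell, k = 1, \ldots, m}$$

with all parameters evaluated at $\psi^\circ$.

[The remainder of this passage is illegible in the source. The legible fragments indicate that $J_r(\psi^\circ)$ is positive definite for each $r$, and that condition (vii) will be satisfied if $J_r(\psi^\circ)$ remains bounded away from zero as $N_r \to \infty$.]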

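Observe that

$$\frac{p_r(X_r|\tau_\ell)}{p_r(X_r|\tau_k)} = \exp\left\{ -N_r \left[ \kappa(\tau_\ell) - \kappa(\tau_k) - \left\langle \tfrac{1}{N_r} G_r \,\Big|\, \tau_\ell - \tau_k \right\rangle \right] \right\}.$$

If $X_r = (X_{r1}, \ldots, X_{rN_r})$ is a sample from $f(x|\tau_k)$, then $\frac{1}{N_r} G_r$
converges to $E_{\tau_k}(F)$ as $N_r \to \infty$, and the expression in square brackets
converges to

$$\kappa(\tau_\ell) - \kappa(\tau_k) - D_\tau \kappa(\tau_k)(\tau_\ell - \tau_k)$$

which is $> 0$ for $\ell \ne k$ by the strict convexity of $\kappa$. Hence,

$$\frac{p_r(X_r|\tau_\ell)}{p_r(X_r|\tau_k)} \to 0 \quad \text{as } N_r \to \infty.$$

Therefore, given that $X_r$ is from $f(x|\tau_k)$,

$$\frac{p_r(X_r|\tau_\ell)}{q_r(X_r|\psi^\circ)} \to \begin{cases} 0, & \ell \ne k \\ 1/\alpha_k, & \ell = k \end{cases} \quad \text{as } N_r \to \infty.$$

Given that $X_r$ is from $f(x|\tau_k)$, $N_r^{-1/2}(G_r - \mu_r(\tau_k))$ converges in distribution
to a normal random variable $Z$ with mean zero and covariance $\mathrm{cov}_{\tau_k}(F)$. Hence,

$$\frac{p_r(X_r|\tau_\ell)}{q_r(X_r|\psi^\circ)}\, N_r^{-1/2} (G_r - \mu_r(\tau_k))$$

converges in distribution to 0 if $\ell \ne k$ and $\frac{1}{\alpha_k} Z$ if $\ell = k$.

Let $A$ be any element of $V$ and consider

$$\left[ N^{-1/2} \langle G_r - \mu_r(\tau_k) | A \rangle \right]^4 = N^{-2} \left[ \sum_{j=1}^{N} \langle F(X_{rj}) - E_{\tau_k}(F) | A \rangle \right]^4$$

where $N = N_r$. After expanding and taking expectation with respect to $\tau_k$, it will be
seen that the only nonvanishing terms are those of the form

$$E_{\tau_k}\left[ \langle F(X_{ri}) - E_{\tau_k}(F) | A \rangle^2\, \langle F(X_{rj}) - E_{\tau_k}(F) | A \rangle^2 \right]$$

of which there are $O(N^2)$. Thus

$$E_{\tau_k}\left[ N^{-1/2} \langle G_r - \mu_r(\tau_k) | A \rangle \right]^4$$

is bounded as $N \to \infty$. It follows from a standard theorem on convergence
of moments [3, p. 95] that

$$E_{\tau_k}\left[ \frac{p_r(X_r|\tau_\ell)}{q_r(X_r|\psi^\circ)}\, N_r^{-1/2} (G_r - \mu_r(\tau_k)) \right] \to 0 \quad \text{as } N_r \to \infty.$$

Thus $B_r \to 0$. Similar reasoning shows that

$$C_r \to \mathrm{diag}\left( \alpha_1\, \mathrm{cov}_{\tau_1}(F), \ldots, \alpha_m\, \mathrm{cov}_{\tau_m}(F) \right)$$

as $N_r \to \infty$. Therefore $J_r(\psi^\circ)$ is bounded away from 0 and this concludes
the proof.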

4. Concluding Remarks

Clearly the assumption in Theorem 2 that $\{N_r\}$ is bounded can be
weakened. In fact, Theorem 1 could be modified in such a way as to show that
the MLE of exponential mixture parameters is strongly consistent when
$\sum_r N_r^2 / r^2 < \infty$.

Redner [7] has shown that when each $N_r = 1$, a certain numerical
procedure for obtaining the MLE of exponential mixture parameters is
convergent. The generalization to bounded $\{N_r\}$ should not be difficult, and
will be addressed in a future report.